Biostatistics For Dummies (Monika Wahi John Pezzullo)

of asking the same question is: Is being a member of a particular row associated with being a

member of a particular column?

In this chapter, we describe two tests you can use to answer this question: the Pearson chi-square test,

and the Fisher Exact test. We also explain how to estimate power and sample sizes for the chi-square

and Fisher Exact tests.

As with other statistical tests, you can run all the tests in this chapter from individual-level

data in a database, where there is one record per participant. But the tests in this chapter can also

be executed using data that has already been summarized in the form of a cross-tab:

Most statistical software is set up to work with individual-level data. In that case, your data file

needs to have two columns for the association you want to test: one containing the categorical

variable representing the treatment group (or whatever category defines the rows of the cross-tab), and one

containing the categorical variable representing the outcome. If you have the correct columns, all

you have to do is tell the statistical software you are using which test or tests you want to run, and

which variables to use in the test.

Most statistical software is also set up so that you can do these tests using summarized data (rather

than individual-level data), as long as you set the appropriate option in your code when running the

tests. In contrast, online calculators that execute these tests expect you to have already cross-

tabulated the data. These calculators usually present a screen showing an empty table, and you

enter the counts into the table’s cells to run the calculation.
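
To make the difference between the two data layouts concrete, here is a minimal Python sketch that turns individual-level records into a cross-tab of counts. The records, the category names, and the `cross_tab` helper are all hypothetical and exist only for illustration; real statistical software has its own built-in cross-tabulation commands.

```python
from collections import Counter

# Hypothetical individual-level data: one (group, outcome) pair per participant.
records = (
    [("drug", "improved")] * 3
    + [("drug", "not improved")] * 1
    + [("placebo", "improved")] * 1
    + [("placebo", "not improved")] * 3
)


def cross_tab(pairs):
    """Count participants in each (row category, column category) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # row labels, e.g. treatment groups
    cols = sorted({c for _, c in pairs})   # column labels, e.g. outcomes
    # Counter returns 0 for any combination that never occurs in the data.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


rows, cols, table = cross_tab(records)
for label, row in zip(rows, table):
    print(label, row)
```

The resulting `table` holds the same cell counts you would type into an online calculator's empty table.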

Examining Two Variables with the Pearson Chi-Square Test

The most commonly used statistical test of association between two categorical variables is the

chi-square test of association, developed by Karl Pearson around 1900. It’s called the chi-square

test because it involves calculating a number, called a test statistic, that fluctuates in accordance

with the chi-square distribution. Many other statistical tests also use the chi-square distribution, but the

test of association is by far the most popular. In this book, whenever we refer to a chi-square test

without specifying which one, we are referring to the Pearson chi-square test of association between

two categorical variables. (Please note that some books use the notation χ² or X² instead of saying the

term chi-square.)

Understanding how the chi-square test works

You don’t have to understand the equations behind the chi-square test if you have a computer to do

the calculations for you, which is the best option, though you can also calculate the test manually. This means you technically

don’t have to read this section. But we encourage you to do so anyway, because we think you’ll have a

better appreciation for the strengths and limitations of the test if you know its mathematical

underpinnings. Here, we walk you through conducting a chi-square test manually (which is possible to

do in Microsoft Excel).
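
As a rough preview of what a manual calculation involves, the following Python sketch applies the standard textbook form of the Pearson chi-square statistic: each cell's expected count is its row total times its column total divided by the grand total, and the statistic sums (observed - expected)² / expected over all cells. The counts are hypothetical, the function names are our own, no continuity correction is applied, and the p-value shortcut shown is valid only for 1 degree of freedom (a 2-by-2 table). Treat it as an illustration under those assumptions, not necessarily the exact sequence of steps used in this book.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    Assumes every row and column total is positive, so that no expected
    count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were not associated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value of the chi-square distribution with 1 df only."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2-by-2 cross-tab: rows are groups, columns are outcomes.
observed = [[30, 10], [20, 20]]
stat, df = pearson_chi_square(observed)
p = p_value_df1(stat)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```

For tables larger than 2-by-2, the statistic and degrees of freedom are computed the same way, but the p-value needs the chi-square distribution for the corresponding degrees of freedom, which statistical software provides.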